1. Linear Regression

Aim

    To implement the linear regression algorithm, fitted by the least squares method, to model the relationship between two variables and predict the value of a dependent variable from an independent variable.

Understand the Linear Regression Algorithm Before You Begin

Overview: Linear regression is a simple, widely used machine learning method for modelling how two variables are related. It predicts the value of one variable (the dependent variable) from another (the independent variable) by fitting a straight line, y = b0 + b1·x, to the data points. The least squares method chooses the intercept b0 and the slope b1 that minimize the sum of squared differences between the predicted and actual values. The resulting best-fit line is then used to predict the dependent variable for new values of the independent variable.
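
As a minimal sketch of the least squares fit described above, the slope and intercept of the best-fit line can be computed in closed form. The toy data and the variable names (`x`, `y`, `slope`, `intercept`, `sse`) are assumptions made for illustration and are not part of the experiment.

```python
import numpy as np

# Toy data (assumed for illustration): y is roughly 2x + 1 with small noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Closed-form least squares estimates for the line y = intercept + slope * x.
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Predictions on the same points, and the sum of squared errors
# that the least squares method minimizes.
y_pred = intercept + slope * x
sse = np.sum((y - y_pred) ** 2)

print(f"slope={slope:.3f}, intercept={intercept:.3f}, SSE={sse:.3f}")
```

Any other choice of slope and intercept gives a larger sum of squared errors on these points, which is what "best fit" means here.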

Further Understanding: Simple Linear Regression

Algorithm

  1. Load the diabetes dataset and use only one feature — BMI (Body Mass Index).
  2. Split the data into training/testing sets.
  3. Split the targets into training/testing sets.
  4. Create a linear regression object.
  5. Train the model using the training sets.
  6. Make predictions using the testing set.
  7. Print the coefficients and mean squared error.
  8. Compute the coefficient of determination (R²).

Coefficient of determination: R² measures the proportion of the variation in the dependent variable that can be predicted from the independent variable(s). With a single feature (BMI), it is the proportion of the variation in disease progression explained by the fitted line.

\( R^2 = 1 - \dfrac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} \)

where \( y_i \) are the actual values, \( \hat{y}_i \) are the predicted values, and \( \bar{y} \) is the mean of the actual values.

  9. Plot outputs: predicted and actual values.
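
The steps above, up to the R² computation, can be sketched with scikit-learn roughly as follows. This is one possible implementation, not a prescribed one: holding out the last 20 samples as the test set is an assumption, and the variable names are illustrative. R² is computed twice, once with `r2_score` and once directly from the formula, to show that the two agree.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Step 1: load the diabetes dataset and keep only the BMI column (index 2).
X, y = datasets.load_diabetes(return_X_y=True)
X_bmi = X[:, np.newaxis, 2]  # shape (442, 1)

# Steps 2-3: split data and targets; the last 20 samples are held out
# for testing (an assumed split, chosen for illustration).
X_train, X_test = X_bmi[:-20], X_bmi[-20:]
y_train, y_test = y[:-20], y[-20:]

# Steps 4-5: create the linear regression object and train it.
model = linear_model.LinearRegression()
model.fit(X_train, y_train)

# Step 6: predict on the testing set.
y_pred = model.predict(X_test)

# Step 7: coefficients and mean squared error.
mse = mean_squared_error(y_test, y_pred)
print("Slope:", model.coef_[0], "Intercept:", model.intercept_)
print("Mean squared error:", mse)

# Step 8: R² from the library and directly from the formula.
r2 = r2_score(y_test, y_pred)
r2_manual = 1 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)
print("R² (r2_score):", r2, "R² (formula):", r2_manual)
```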

About Diabetes Dataset

Ten baseline variables — age, sex, BMI, average blood pressure, and six blood serum measurements — were obtained for each of n = 442 diabetes patients, along with the response variable indicating disease progression one year after baseline.

Data Set Characteristics

  • Number of Instances: 442
  • Number of Attributes: 10 numeric predictive variables
  • Target: Quantitative measure of disease progression

Attribute Information

  • age — age in years
  • sex — gender
  • bmi — body mass index
  • bp — average blood pressure
  • s1 — tc, total serum cholesterol
  • s2 — ldl, low-density lipoproteins
  • s3 — hdl, high-density lipoproteins
  • s4 — tch, total cholesterol / HDL ratio
  • s5 — ltg, possibly log of serum triglycerides level
  • s6 — glu, blood sugar level
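
To check these characteristics directly, the dataset can be loaded and inspected. The sketch below assumes the copy bundled with scikit-learn (`load_diabetes`), in which the feature columns are already mean-centered and scaled, so the values are not in raw units such as years or kg/m².

```python
from sklearn.datasets import load_diabetes

# Load the copy of the dataset that ships with scikit-learn.
diabetes = load_diabetes()

# Number of instances and attributes.
print("Shape:", diabetes.data.shape)

# Column names, in the same order as the attribute list above.
print("Features:", diabetes.feature_names)

# Locate the BMI column used in this experiment.
bmi_index = diabetes.feature_names.index("bmi")
bmi = diabetes.data[:, bmi_index]
print("BMI column index:", bmi_index)
print("BMI mean (close to 0 because the column is centered):", bmi.mean())
```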

Source: Diabetes Dataset

Simulation

Plotting regression line and predicting values based on new inputs.
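
A rough sketch of such a plot with matplotlib is given below. The new BMI inputs in `new_bmi` are hypothetical and are expressed in the same scaled units as the scikit-learn copy of the dataset. The output file name and the train/test split are likewise assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

# Load the data, keep only BMI (column 2), and hold out the last 20 samples.
X, y = load_diabetes(return_X_y=True)
X_bmi = X[:, [2]]
X_train, X_test = X_bmi[:-20], X_bmi[-20:]
y_train, y_test = y[:-20], y[-20:]

model = LinearRegression().fit(X_train, y_train)

# Hypothetical new inputs, in the dataset's scaled BMI units.
new_bmi = np.array([[-0.05], [0.0], [0.05]])
new_pred = model.predict(new_bmi)

# Actual test values, the fitted line, and the predictions for the new inputs.
fig, ax = plt.subplots()
ax.scatter(X_test[:, 0], y_test, color="black", label="actual (test set)")
ax.plot(X_test[:, 0], model.predict(X_test), color="blue", label="regression line")
ax.scatter(new_bmi[:, 0], new_pred, color="red", marker="x", label="new inputs")
ax.set_xlabel("BMI (scaled)")
ax.set_ylabel("Disease progression")
ax.legend()
fig.savefig("regression_line.png")
```

Predictions for new inputs fall on the regression line by construction, so the red markers show where the model places each new BMI value.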

Pre-Lab Questions

  1. Give the linear regression equation and explain its working principle.
  2. What is the difference between classification and prediction problems?
  3. Can regression be used for classification? Give your comments.

Post-Lab Questions

  1. Make a prediction using BP and blood sugar level as features.
  2. Compare values from the dataset and your model output.

Result

The linear regression model was successfully implemented using the BMI feature of the diabetes dataset. It predicted disease progression with a coefficient of determination (R²) of __________ and a mean squared error of ___________.